Supervised Classification

Since results of unsupervised clustering are not quite satisfying and human-created training data is available, supervised classification may be a way to achieve more accurate results.

Using packages “sp” and “rgdal”, shape files can be read which contain geometric shapes (in ths case points) associated with classes. The first line of the following reads the shapes. The labels of each point are then converted to a factor, which contains levels for each label, thereby recognizing the different categories.

trainShapes <- readOGR(dsn="C:\\Users\\D059331\\Desktop\\DM GIC\\data\\shp\\trn", layer="J_treino_QB_Tot_point")
## OGR data source with driver: ESRI Shapefile 
## Source: "C:\Users\D059331\Desktop\DM GIC\data\shp\trn", layer: "J_treino_QB_Tot_point"
## with 483173 features
## It has 3 fields
trainFactor <- as.factor(trainShapes$Label)
length(levels(trainFactor))
## [1] 10

As shown in the output, there are 10 different labels in the training set. A color vector of 10 different colors is created in order to plot the different labels on the image.

colors <- c("#CC0000", "#FFD700", "#0CCC0C", "#008B8B", "#191970", "#8A2BE2", "#D8BFD8", "#8B4513", "#000000", "#FF6347")
# Plot the raster
plotRGB(rasterJ,3,2,1)
plot(trainShapes, add=TRUE, col=colors[trainFactor])

Before fitting a model, the raster is divided up into multiple sectors. Some of those sectors can then be used for training and others for validation. The selected sectors are all the same size and were created by moving from the top left corner to the bottom, the the right and diagonally. Approximately 40% of the sectors shall be used for validation. The following code creates the subrasters. Then, the training data is extracted at the location of reach sector and is plotted on top of each sector.

######################## Problem: some have NA's!
trainingExtents <- c(
    extent(-105000, -103000, -42000, -40000),
    extent(-105000, -103000, -44000, -42000),
    extent(-105000, -103000, -46000, -44000),
    extent(-103000, -101000, -42000, -40000),
    extent(-103000, -101000, -44000, -42000),
    extent(-103000, -101000, -50000, -48000),
    extent(-101000, -99000, -42000, -40000),
    extent(-101000, -99000, -48000, -46000),
    extent(-101000, -99000, -50000, -48000),
    extent(-99000, -97000, -46000, -44000),
    extent(-99000, -97000, -48000, -46000),
    extent(-99000, -97000, -50000, -48000),
    extent(-97000, -95000, -44000, -42000),
    extent(-97000, -95000, -46000, -44000),
    extent(-97000, -95000, -48000, -46000)
)
# Shapes with classes used as training data
trainingShapesAt <- lapply( rep("SpatialPointsDataFrame", length(trainingExtents)), new )
# Full sectors used for plotting
trainingSectors <- lapply( rep("RasterBrick", length(trainingExtents)), new )
# Extracted values from sector at location of available training points
trainingSectorValues <- c()
for(i in 1:length(trainingExtents)) {
    trainingShapesAt[[i]] <- crop(trainShapes, trainingExtents[[i]])
    trainingSectors[[i]] <- crop(rasterJ, trainingExtents[[i]])
    trainingSectorValues[[i]] <- extract(trainingSectors[[i]], trainingShapesAt[[i]])
    plotRGB(trainingSectors[[i]],3,2,1)
    plot(trainingShapesAt[[i]], add=TRUE, col=colors[trainFactor])
}
## Warning in .local(x, y, ...): Transforming SpatialPoints to the CRS of the
## Raster
## Warning in .local(x, y, ...): Transforming SpatialPoints to the CRS of the
## Raster

## Warning in .local(x, y, ...): Transforming SpatialPoints to the CRS of the
## Raster

## Warning in .local(x, y, ...): Transforming SpatialPoints to the CRS of the
## Raster

## Warning in .local(x, y, ...): Transforming SpatialPoints to the CRS of the
## Raster

## Warning in .local(x, y, ...): Transforming SpatialPoints to the CRS of the
## Raster

## Warning in .local(x, y, ...): Transforming SpatialPoints to the CRS of the
## Raster

## Warning in .local(x, y, ...): Transforming SpatialPoints to the CRS of the
## Raster

## Warning in .local(x, y, ...): Transforming SpatialPoints to the CRS of the
## Raster

## Warning in .local(x, y, ...): Transforming SpatialPoints to the CRS of the
## Raster

## Warning in .local(x, y, ...): Transforming SpatialPoints to the CRS of the
## Raster

## Warning in .local(x, y, ...): Transforming SpatialPoints to the CRS of the
## Raster

## Warning in .local(x, y, ...): Transforming SpatialPoints to the CRS of the
## Raster

## Warning in .local(x, y, ...): Transforming SpatialPoints to the CRS of the
## Raster

## Warning in .local(x, y, ...): Transforming SpatialPoints to the CRS of the
## Raster

validationExtents <- c(
    extent(-105000, -103000, -48000, -46000),
    extent(-105000, -103000, -50000, -48000),
    extent(-103000, -101000, -46000, -44000),
    extent(-103000, -101000, -48000, -46000),
    extent(-101000, -99000, -44000, -42000),
    extent(-101000, -99000, -46000, -44000),
    extent(-99000, -97000, -42000, -40000),
    extent(-99000, -97000, -44000, -42000),
    extent(-97000, -95000, -42000, -40000),
    extent(-97000, -95000, -50000, -48000)
)
validationShapesAt <- lapply( rep("SpatialPointsDataFrame", length(validationExtents)), new )
validationSectors <- lapply( rep("RasterBrick", length(validationExtents)), new )
for(i in 1:length(validationExtents)) {
    validationShapesAt[[i]] <- crop(trainShapes, validationExtents[[i]])
    validationSectors[[i]] <- crop(rasterJ, validationExtents[[i]])
    plotRGB(validationSectors[[i]],3,2,1)
    plot(validationShapesAt[[i]], add=TRUE, col=colors[trainFactor], pch=".")
}

The next step is to fit a model by training with one sector in the training set. This study uses the random forest classifier.

trainPos <- 1
trainDataFrame <- data.frame(trainingSectorValues[[trainPos]], trainingShapesAt[[trainPos]])
# Structure of the Data Frame:
head(trainDataFrame, 5)
##    J_04AUG14112729.M2AS.000000137917_01_P001_etrs89.1
## 95                                                 35
## 96                                                 34
## 97                                                 34
## 98                                                 34
## 99                                                 34
##    J_04AUG14112729.M2AS.000000137917_01_P001_etrs89.2
## 95                                                 51
## 96                                                 49
## 97                                                 50
## 98                                                 50
## 99                                                 50
##    J_04AUG14112729.M2AS.000000137917_01_P001_etrs89.3
## 95                                                 39
## 96                                                 37
## 97                                                 37
## 98                                                 39
## 99                                                 39
##    J_04AUG14112729.M2AS.000000137917_01_P001_etrs89.4 POINTID GRID_CODE
## 95                                                 50      95        34
## 96                                                 49      96        34
## 97                                                 50      97        34
## 98                                                 52      98        34
## 99                                                 50      99        34
##    Label coords.x1 coords.x2 optional
## 95    34 -104246.9 -40000.35     TRUE
## 96    34 -104244.5 -40000.35     TRUE
## 97    34 -104242.1 -40000.35     TRUE
## 98    34 -104239.7 -40000.35     TRUE
## 99    34 -104237.3 -40000.35     TRUE
forest <- randomForest(
    x=trainDataFrame[,1:4],
    y=as.factor(trainingShapesAt[[trainPos]]$Label)
)

The above code trained the classifier with only one training sector. This classifier can be used to predict labels for new data. In the following, the random forest model is used to predict the sector it has been trained with and plots the prediction results.

prediction <- predict(trainingSectors[[trainPos]], forest, type="response", na.rm=TRUE)
plotRGB(trainingSectors[[trainPos]], 3, 2, 1)

plot(prediction)

Just by observing the picture it is apparent that the model somewhat recognizes the landscape. Since there are correct values available for some points (that is, our training data) we can calculate the error rate on those points. The error can be calculated by counting the missclassifications and then dividing that number by the total number of training points available in the sector.

calculateError <- function(prediction, actual) {
    prediction <- extract(prediction, actual)
    trainDiff <- prediction - actual
    trainDiffCount <- 0
    for(i in 1:length(prediction)) {
        if(trainDiff[i] != 0) {
            trainDiffCount <- trainDiffCount + 1
        }
    }
    return(trainDiffCount  / length(trainDiff))
}
calculateError(prediction, trainingShapesAt[[trainPos]]$Label)
## [1] 0.06054854